Agricultural Climate Change¶

Chimzurumoke Orji¶

12/10/2021¶

Introduction¶

Climate change is one of the most discussed topics in the news today. It has become something of a buzzword, with the implication that you are not a "decent person" unless you change your personal life in response to it, yet the vast majority of emissions stem from companies, farms, and transportation. At the beginning of the COVID-19 pandemic, when people were quarantined in their homes, the global footprint barely decreased.

Much of the conversation focuses on switching from gas appliances and vehicles to their electric counterparts to reduce emissions. I wanted instead to explore the agricultural side of climate change.

In this project I will use data from the Food and Agriculture Organization of the United Nations (FAO). The goal is to share a better understanding of how agriculture affects our planet by addressing two questions.

How has the temperature changed throughout the years?

Can we predict future emissions?

Packages Used¶

In [1]:
import pandas as pd #used for data frames and reading CSV files
import numpy as np #used for numerical arrays
import seaborn as sns #used for data visualization
%matplotlib inline 
#used to see plots in jupyter notebook


import chart_studio.plotly as py #used for choropleth maps
import plotly.graph_objs as go 
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.express as px

Importing Global Temperature Change Data for Research¶

Source: https://www.fao.org/faostat/en/#data/ET

In [2]:
df_temp = pd.read_csv('FAOSTAT_temp_change.csv')
df_temp.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 162300 entries, 0 to 162299
Data columns (total 14 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   Domain Code       162300 non-null  object 
 1   Domain            162300 non-null  object 
 2   Area Code (FAO)   162300 non-null  int64  
 3   Area              162300 non-null  object 
 4   Element Code      162300 non-null  int64  
 5   Element           162300 non-null  object 
 6   Months Code       162300 non-null  int64  
 7   Months            162300 non-null  object 
 8   Year Code         162300 non-null  int64  
 9   Year              162300 non-null  int64  
 10  Unit              162300 non-null  object 
 11  Value             156681 non-null  float64
 12  Flag              162300 non-null  object 
 13  Flag Description  162300 non-null  object 
dtypes: float64(1), int64(5), object(8)
memory usage: 17.3+ MB

The column names and their meanings in this study: Domain Code - the data set code; Domain - what the data set pertains to; Area Code (FAO) - the country code; Area - the country name; Element Code - the code for the quantity measured (either temperature change or standard deviation); Element - the name of that quantity (here, temperature change); Months Code - the code for the month; Months - the month name; Year Code - the year code; Year - the year; Unit - the unit of measurement; Value - the temperature change for that country, month, and year; Flag - the flag code for the row (Fc: calculated data, NV: data not available, NA: not applicable); Flag Description - the meaning of the flag.
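The `.info()` output shows 156,681 non-null entries in Value out of 162,300 rows, so 5,619 values are missing. A minimal sketch of how the missing values could be tallied by flag code is given here. It uses a toy stand-in for `df_temp` (the name `toy_temp` and its rows are made up for illustration, apart from the first Afghanistan value).

```python
import pandas as pd

# Toy stand-in for df_temp; rows are illustrative only, not real FAO records.
toy_temp = pd.DataFrame({
    "Area": ["Afghanistan", "Afghanistan", "Albania", "Albania"],
    "Year": [1961, 1962, 1961, 1962],
    "Value": [0.746, None, 0.5, None],
    "Flag": ["Fc", "NV", "Fc", "NV"],
})

# Mark missing values, then count them within each flag code.
missing_by_flag = toy_temp["Value"].isna().groupby(toy_temp["Flag"]).sum()
print(missing_by_flag)
```

Running the same two lines on `df_temp` would show whether every missing Value carries the NV flag, which is what the flag description suggests but the notebook does not verify.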

In [3]:
df_temp.head()
Out[3]:
Domain Code Domain Area Code (FAO) Area Element Code Element Months Code Months Year Code Year Unit Value Flag Flag Description
0 ET Temperature change 2 Afghanistan 7271 Temperature change 7001 January 1961 1961 °C 0.746 Fc Calculated data
1 ET Temperature change 2 Afghanistan 7271 Temperature change 7001 January 1962 1962 °C 0.009 Fc Calculated data
2 ET Temperature change 2 Afghanistan 7271 Temperature change 7001 January 1963 1963 °C 2.695 Fc Calculated data
3 ET Temperature change 2 Afghanistan 7271 Temperature change 7001 January 1964 1964 °C -5.277 Fc Calculated data
4 ET Temperature change 2 Afghanistan 7271 Temperature change 7001 January 1965 1965 °C 1.827 Fc Calculated data

Exploring the Dataset¶

This data set consists of temperature variability for countries around the world, dating back to 1961. Looking over the data set, there isn't much to clean other than the Year Code column, which duplicates Year.

In [4]:
df_temp.drop('Year Code', axis=1, inplace=True)
df_temp.describe()
Out[4]:
Area Code (FAO) Element Code Months Code Year Value
count 162300.000000 162300.0 162300.000000 162300.000000 156681.000000
mean 130.647689 7271.0 7006.500000 1991.306248 0.493085
std 76.809078 0.0 3.452063 17.333268 1.114205
min 1.000000 7271.0 7001.000000 1961.000000 -9.303000
25% 64.000000 7271.0 7003.750000 1976.000000 -0.103000
50% 131.000000 7271.0 7006.500000 1992.000000 0.417000
75% 194.000000 7271.0 7009.250000 2006.000000 1.031000
max 351.000000 7271.0 7012.000000 2020.000000 11.759000

The minimum (-9.303 °C) and maximum (11.759 °C) temperature changes are far apart, and the mean change is positive (about 0.49 °C), so I chose a choropleth map to visualize how the change develops over the years.

In [5]:
fig=px.choropleth(df_temp,locations="Area", #the 'Area' column holds the country names
locationmode="country names",animation_frame="Year", #match locations by country name; one animation frame per year
animation_group="Area",color="Value", #track each country across frames; color by the 'Value' column
color_continuous_scale=["#E0FFFF", "#FF0000"] , hover_name="Area", #color range from light cyan to red indicating temperature change
title = "Global Temperature Change")

fig.show()

The map has some errors: a few countries show up blue instead of red, which I attribute to the way values are recorded in the data set. Overall, though, the choropleth map shows how temperatures rose across the globe over the years.
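The Months Code column (7001 to 7012) indicates one row per country, month, and year, so each animation frame receives several values per country. The sketch below collapses those rows to a single yearly mean per country before plotting. This is an assumption about what a cleaner input could look like, not a step the notebook performs; `toy_temp` is a made-up stand-in for `df_temp`, and only the two January values come from the data shown earlier.

```python
import pandas as pd

# Toy stand-in for df_temp: two months for two years of one country.
# The February values are invented for illustration.
toy_temp = pd.DataFrame({
    "Area": ["Afghanistan"] * 4,
    "Months": ["January", "February", "January", "February"],
    "Year": [1961, 1961, 1962, 1962],
    "Value": [0.746, -1.0, 0.009, 1.0],
})

# One row per country and year: the mean of the monthly values.
# mean() skips missing values by default.
yearly = (
    toy_temp.groupby(["Area", "Year"], as_index=False)["Value"]
    .mean()
    .sort_values("Year")
)
print(yearly)
```

A frame shaped like `yearly` could then be passed to `px.choropleth` in place of `df_temp`, with the same `locations`, `animation_frame`, and `color` arguments.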

This temperature change suggests that further observation is needed to understand today's climate.
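To put a number on the first question, one option is to average the values per year and fit a straight line through those averages. The sketch below does this on made-up numbers (`toy` is hypothetical). An unweighted mean over rows is only a rough proxy for a global average, since it ignores country size.

```python
import numpy as np
import pandas as pd

# Made-up values: two observations per year, rising by 0.1 per year.
toy = pd.DataFrame({
    "Year": [1961, 1961, 1962, 1962, 1963, 1963],
    "Value": [0.0, 0.2, 0.1, 0.3, 0.2, 0.4],
})

# Mean temperature change per year across all rows.
yearly_mean = toy.groupby("Year")["Value"].mean()

# Center the years before fitting to keep the least-squares fit well conditioned;
# centering does not change the slope.
years = yearly_mean.index.to_numpy(dtype=float)
slope, intercept = np.polyfit(years - years.mean(), yearly_mean.to_numpy(), deg=1)
print(f"trend: {slope:.3f} °C per year")
```

Applied to `df_temp`, the slope would give an average yearly rate of change over 1961 to 2020 under these simplifying assumptions.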

Importing Emissions Totals Data for Research¶

Source: https://www.fao.org/faostat/en/#data/GT

In [6]:
df_emm = pd.read_csv('Emissions_Totals.csv')
df_emm.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13949 entries, 0 to 13948
Data columns (total 17 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Domain Code       13949 non-null  object 
 1   Domain            13949 non-null  object 
 2   Area Code (FAO)   13949 non-null  int64  
 3   Area              13949 non-null  object 
 4   Element Code      13949 non-null  int64  
 5   Element           13949 non-null  object 
 6   Item Code         13949 non-null  int64  
 7   Item              13949 non-null  object 
 8   Year Code         13949 non-null  int64  
 9   Year              13949 non-null  int64  
 10  Source Code       13949 non-null  int64  
 11  Source            13949 non-null  object 
 12  Unit              13949 non-null  object 
 13  Value             13949 non-null  float64
 14  Flag              13949 non-null  object 
 15  Flag Description  13949 non-null  object 
 16  Note              0 non-null      float64
dtypes: float64(2), int64(6), object(9)
memory usage: 1.8+ MB

The column names and their meanings in this study: Area Code (FAO) - the country code; Area - the country name; Item Code - the code for the emission source; Item - the name of the emission source; Element Code - the code for the emitted gas; Element - the name of the emitted gas; Year Code - the year code; Year - the year; Source Code - the code for the data source; Source - the name of the data source; Unit - the unit of measurement used in this data set; Value - the amount recorded; Flag - the flag code for the row; Flag Description - the meaning of the flag; Note - additional information about each row.

In [7]:
df_emm.head()
Out[7]:
Domain Code Domain Area Code (FAO) Area Element Code Element Item Code Item Year Code Year Source Code Source Unit Value Flag Flag Description Note
0 GT Emissions Totals 2 Afghanistan 7225 Emissions (CH4) 5058 Enteric Fermentation 2019 2019 3050 FAO TIER 1 kilotonnes 389.6563 Fc Calculated data NaN
1 GT Emissions Totals 2 Afghanistan 724413 Emissions (CO2eq) from CH4 (AR5) 5058 Enteric Fermentation 2019 2019 3050 FAO TIER 1 kilotonnes 10910.3754 Fc Calculated data NaN
2 GT Emissions Totals 2 Afghanistan 723113 Emissions (CO2eq) (AR5) 5058 Enteric Fermentation 2019 2019 3050 FAO TIER 1 kilotonnes 10910.3754 Fc Calculated data NaN
3 GT Emissions Totals 2 Afghanistan 7225 Emissions (CH4) 5059 Manure Management 2019 2019 3050 FAO TIER 1 kilotonnes 26.1252 Fc Calculated data NaN
4 GT Emissions Totals 2 Afghanistan 7230 Emissions (N2O) 5059 Manure Management 2019 2019 3050 FAO TIER 1 kilotonnes 0.3654 Fc Calculated data NaN
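Rows 0 and 1 above report the same source (Enteric Fermentation) once as CH4 and once as CO2-equivalent. The short check below, using only those two values, shows the conversion factor they imply. Reading that factor as the AR5 100-year global warming potential for methane (28) is my inference from the "(AR5)" label, not something the data set states in these rows.

```python
# Values copied from rows 0 and 1 of df_emm.head().
ch4_kt = 389.6563        # Emissions (CH4), kilotonnes
co2eq_kt = 10910.3754    # Emissions (CO2eq) from CH4 (AR5), kilotonnes

# Ratio of CO2-equivalent to raw methane mass.
implied_factor = co2eq_kt / ch4_kt
print(round(implied_factor, 2))  # 28.0

# Assumed AR5 GWP-100 for methane, used here only for comparison.
GWP_CH4_AR5 = 28
print(abs(implied_factor - GWP_CH4_AR5) < 0.001)  # True
```

This also explains why rows 1 and 2 carry the same Value: for a methane-only source, total CO2eq equals CO2eq from CH4.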

Exploring the Dataset¶

This data set consists of emission types and amounts for each country per year. Other than the redundant Year Code column and the empty Note column, there isn't much else to clean for now.

In [8]:
df_emm.drop('Year Code', axis=1, inplace=True)
df_emm.drop('Note', axis=1, inplace=True)
df_emm.describe()
Out[8]:
Area Code (FAO) Element Code Item Code Year Source Code Value
count 13949.000000 13949.000000 13949.000000 13949.0 13949.0 13949.000000
mean 130.726145 387474.273568 13503.999642 2019.0 3050.0 1276.922239
std 76.730232 357616.933583 20592.883945 0.0 0.0 21896.693851
min 1.000000 7225.000000 5058.000000 2019.0 3050.0 -651765.300100
25% 65.000000 7230.000000 5062.000000 2019.0 3050.0 0.000300
50% 129.000000 723113.000000 6750.000000 2019.0 3050.0 1.247100
75% 194.000000 724313.000000 6993.000000 2019.0 3050.0 88.975500
max 351.000000 724413.000000 69921.000000 2019.0 3050.0 653402.772000

Since there are several types of emissions along with multiple sources of said emissions, I need to filter each emission type to get a better estimate of the mean, standard deviation, minimum and maximum numbers.

In [9]:
df_emm_ch4 = df_emm[df_emm['Element Code']==7225] #CH4 Emissions (Methane)
df_emm_ch4.describe()
Out[9]:
Area Code (FAO) Element Code Item Code Year Source Code Value
count 1856.000000 1856.0 1856.000000 1856.0 1856.0 1856.000000
mean 131.440733 7225.0 14021.238147 2019.0 3050.0 87.703295
std 76.694013 0.0 21047.855179 0.0 0.0 600.519128
min 1.000000 7225.0 5058.000000 2019.0 3050.0 0.000000
25% 66.000000 7225.0 5060.000000 2019.0 3050.0 0.000000
50% 130.000000 7225.0 6795.000000 2019.0 3050.0 0.142450
75% 195.000000 7225.0 6993.000000 2019.0 3050.0 7.191150
max 351.000000 7225.0 69921.000000 2019.0 3050.0 14053.658400
In [10]:
df_emm_n2o = df_emm[df_emm['Element Code']==7230] #N2O Emissions (Nitrous Oxide)
df_emm_n2o.describe()
Out[10]:
Area Code (FAO) Element Code Item Code Year Source Code Value
count 2173.000000 2173.0 2173.000000 2173.0 2173.0 2173.000000
mean 130.722964 7230.0 15546.099862 2019.0 3050.0 4.385206
std 76.847692 0.0 22925.319603 0.0 0.0 25.818491
min 1.000000 7230.0 5059.000000 2019.0 3050.0 0.000000
25% 65.000000 7230.0 5062.000000 2019.0 3050.0 0.001500
50% 129.000000 7230.0 5066.000000 2019.0 3050.0 0.096700
75% 194.000000 7230.0 6994.000000 2019.0 3050.0 1.222300
max 351.000000 7230.0 69921.000000 2019.0 3050.0 614.678800
In [11]:
df_emm_co2 = df_emm[df_emm['Element Code']==7273] #CO2 Emissions (Carbon Dioxide)
df_emm_co2.describe()
Out[11]:
Area Code (FAO) Element Code Item Code Year Source Code Value
count 992.000000 992.0 992.000000 992.0 992.0 992.000000
mean 131.002016 7273.0 13329.879032 2019.0 3050.0 1326.107959
std 76.921337 0.0 18674.642952 0.0 0.0 51278.042131
min 1.000000 7273.0 6750.000000 2019.0 3050.0 -651765.300100
25% 65.000000 7273.0 6751.000000 2019.0 3050.0 0.000000
50% 129.000000 7273.0 6993.000000 2019.0 3050.0 0.000000
75% 194.000000 7273.0 6994.000000 2019.0 3050.0 434.604650
max 351.000000 7273.0 67292.000000 2019.0 3050.0 653402.772000
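The three filtered summaries above can also be produced in one step by grouping on the Element column. The sketch below shows the idea on a toy frame (`toy_emm` is a hypothetical stand-in for `df_emm`, filled with three values from the head of the data set).

```python
import pandas as pd

# Toy stand-in for df_emm using three values from df_emm.head().
toy_emm = pd.DataFrame({
    "Element": ["Emissions (CH4)", "Emissions (CH4)", "Emissions (N2O)"],
    "Item": ["Enteric Fermentation", "Manure Management", "Manure Management"],
    "Value": [389.6563, 26.1252, 0.3654],
})

# One row of summary statistics per emission type.
summary = toy_emm.groupby("Element")["Value"].describe()
print(summary)
```

With a single row in a group, as for N2O here, the standard deviation is reported as NaN.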

Enteric Fermentation - Digestive process by which carbohydrates are broken down by microorganisms into simple molecules for absorption into the bloodstream of an animal. Greenhouse gas emissions from enteric fermentation consist of methane gas.

Manure Management - Refers to capture, storage, treatment, and utilization of animal manure. Greenhouse gas emissions from manure management consist of methane and nitrous oxide gases from aerobic and anaerobic manure decomposition processes.

Rice Cultivation - Agricultural practice for growing rice seeds. Greenhouse gas emissions from rice cultivation consist of methane gas from the anaerobic decomposition of organic matter in paddy fields.

Synthetic Fertilizers - Inorganic material of synthetic origin added to a soil to supply one or more plant nutrients essential to the growth of plants. Greenhouse gas emissions from synthetic fertilizers consist of the addition of nitrous oxide gas to managed soils.

Manure applied to Soils - Animal waste distributed on fields in amounts that enrich soils. Greenhouse gas emissions from manure applied to soils consist of nitrous oxide gas from manure added to managed soils.

Manure left on Pasture - Animal waste left on managed soils from grazing livestock. Greenhouse gas emissions from manure left on pasture consist of nitrous oxide gas.

Crop Residues - Agriculture management practice that consists in returning to managed soils the residual part of the produce. The associated greenhouse gas emissions are nitrous oxide gas from crop residues’ decomposition.

Burning - Crop residues - Agriculture management practice that consists in the combustion of a percentage of crop residues burnt on-site. Greenhouse gas emissions from burning crop residues are methane and nitrous oxide gases.

Net Forest conversion - The net forest conversion is calculated as the difference of forest area for two consecutive years, consistently with IPCC approach 1. The term “net” indicates that no further specification on the underlying dynamics of the computed land area change is possible. Greenhouse gas emissions consist of the net contribution of CO2 sources and sinks due to deforestation, reforestation and afforestation activities within countries.

In [12]:
fig=px.choropleth(df_emm,locations="Area", #the 'Area' column holds the country names
locationmode="country names",animation_frame="Item", #match locations by country name; one frame per emission source
animation_group="Area",color="Value", #track each country across frames; color by the 'Value' column
color_continuous_scale='reds' , hover_name="Area", #red color range indicating volume
hover_data=['Element'], title = "Emissions Totals for 2019") #show the emitted gas on hover

fig.show()

Set up the Data for the Random Forest Classification Model¶

Checking .info() again to see which columns are objects.

In [13]:
df_emm.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13949 entries, 0 to 13948
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Domain Code       13949 non-null  object 
 1   Domain            13949 non-null  object 
 2   Area Code (FAO)   13949 non-null  int64  
 3   Area              13949 non-null  object 
 4   Element Code      13949 non-null  int64  
 5   Element           13949 non-null  object 
 6   Item Code         13949 non-null  int64  
 7   Item              13949 non-null  object 
 8   Year              13949 non-null  int64  
 9   Source Code       13949 non-null  int64  
 10  Source            13949 non-null  object 
 11  Unit              13949 non-null  object 
 12  Value             13949 non-null  float64
 13  Flag              13949 non-null  object 
 14  Flag Description  13949 non-null  object 
dtypes: float64(1), int64(5), object(9)
memory usage: 1.6+ MB

For this part of the research I won't need the columns that don't help with the prediction.

In [14]:
#drop the columns that are not used as features for the prediction
df_emm.drop(['Domain Code', 'Domain', 'Area Code (FAO)', 'Element Code', 'Item Code',
             'Year', 'Source Code', 'Source', 'Unit', 'Flag', 'Flag Description'],
            axis=1, inplace=True)

Categorical Features¶

Since Area, Element, and Item are all categorical objects, these columns need to be transformed into numerical (dummy) variables for sklearn to understand them.

In [15]:
cat_feats = ['Area', 'Element', 'Item']
In [16]:
final_data = pd.get_dummies(df_emm, columns = cat_feats, drop_first = True)
In [17]:
final_data.head()
Out[17]:
Value Area_Albania Area_Algeria Area_American Samoa Area_Andorra Area_Angola Area_Anguilla Area_Antigua and Barbuda Area_Argentina Area_Armenia ... Item_Forest fires Item_Forestland Item_Manure Management Item_Manure applied to Soils Item_Manure left on Pasture Item_Net Forest conversion Item_On-farm energy use Item_Rice Cultivation Item_Savanna fires Item_Synthetic Fertilizers
0 389.6563 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 10910.3754 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 10910.3754 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 26.1252 0 0 0 0 0 0 0 0 0 ... 0 0 1 0 0 0 0 0 0 0
4 0.3654 0 0 0 0 0 0 0 0 0 ... 0 0 1 0 0 0 0 0 0 0

5 rows × 259 columns
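To make the effect of `drop_first = True` concrete, the sketch below encodes a three-row toy frame (`toy` is hypothetical, and the Rice Cultivation value is made up). The alphabetically first category gets no column of its own; a row belonging to it is represented by zeros in all remaining dummy columns.

```python
import pandas as pd

# Toy frame with one categorical column; the third value is invented.
toy = pd.DataFrame({
    "Value": [389.6563, 26.1252, 1.0],
    "Item": ["Enteric Fermentation", "Manure Management", "Rice Cultivation"],
})

# drop_first=True removes the dummy column for the first category.
encoded = pd.get_dummies(toy, columns=["Item"], drop_first=True)
print(list(encoded.columns))
# ['Value', 'Item_Manure Management', 'Item_Rice Cultivation']
```

The same behavior is visible in the output above: there is no Area_Afghanistan column, and the Afghanistan rows have 0 in every Area_ column.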

Train Test Split¶

Here I will split the data into a training set and a testing set for predictions. For this project I chose Enteric Fermentation as the target: the model predicts whether a row's emission source is Enteric Fermentation.

In [18]:
from sklearn.model_selection import train_test_split #used to split data
In [19]:
X = final_data.drop('Item_Enteric Fermentation',axis = 1)

y = final_data['Item_Enteric Fermentation']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3 , random_state= 50)

Training a Decision Tree Model¶

This part sets up the decision tree for later predictions.

In [20]:
from sklearn.tree import DecisionTreeClassifier #to create an instance and fit it with the data
In [21]:
dtree = DecisionTreeClassifier()
In [22]:
dtree.fit(X_train, y_train)
Out[22]:
DecisionTreeClassifier()

Predictions and Evaluation of Decision Tree¶

Now to set up predictions and a confusion matrix.

In [23]:
y_predict = dtree.predict(X_test)
In [24]:
from sklearn.metrics import confusion_matrix, classification_report
In [25]:
print(classification_report(y_test, y_predict))
              precision    recall  f1-score   support

           0       0.99      0.99      0.99      4004
           1       0.67      0.67      0.67       181

    accuracy                           0.97      4185
   macro avg       0.83      0.83      0.83      4185
weighted avg       0.97      0.97      0.97      4185

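For class 0 (rows that are not Enteric Fermentation), precision, recall, and f1-score are all 99%. For class 1 (Enteric Fermentation rows), all three are 67%. The macro averages are 83% and the overall accuracy is 97%. The high accuracy largely reflects the class imbalance (4,004 rows of class 0 versus 181 of class 1), so the model is good at ruling out Enteric Fermentation but only moderately good at identifying it.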

In [26]:
print(confusion_matrix(y_test, y_predict))
[[3944   60]
 [  59  122]]

The confusion matrix, with rows as actual classes and columns as predicted classes, shows 3,944 true negatives (TN), 60 false positives (FP), 59 false negatives (FN), and 122 true positives (TP). There are about twice as many true positives as either false positives or false negatives.
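As a cross-check, the class 1 scores in the classification report can be recomputed by hand from the four counts. The sketch assumes scikit-learn's layout for a binary confusion matrix: rows are actual classes, columns are predicted classes, and labels are sorted, so the matrix reads [[TN, FP], [FN, TP]].

```python
# Counts copied from the confusion matrix printed above.
cm = [[3944, 60],
      [59, 122]]
tn, fp = cm[0]  # actual class 0: correctly rejected, wrongly flagged
fn, tp = cm[1]  # actual class 1: missed, correctly identified

precision = tp / (tp + fp)                      # share of predicted 1s that are right
recall = tp / (tp + fn)                         # share of actual 1s that are found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tn + fp + fn + tp)

print(f"{precision:.2f} {recall:.2f} {f1:.2f} {accuracy:.2f}")
# 0.67 0.67 0.67 0.97
```

The results match the class 1 row and the accuracy line of the report, which supports reading 122 as the true positives rather than the true negatives.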

In summary, agriculture is a major component of environmental change because of the amount of greenhouse gas (GHG) emissions generated on farms and on agricultural land. With Enteric Fermentation selected as the target, I found that the data set works better with the random forest classification model than with a single decision tree, although only the decision tree results are shown above.
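For readers who want to try the comparison themselves, here is a minimal sketch of fitting both model types on the same split. It runs on synthetic data (the features, labels, and seed are all made up), so it does not reproduce the result reported here. It only illustrates the scikit-learn calls involved. Passing `stratify=y` is my addition, meant to keep the minority class represented in both splits.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary problem; not the FAO data.
rng = np.random.default_rng(50)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + X[:, 1] > 1.5).astype(int)

# Same split settings as the notebook, plus stratification on the target.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=50, stratify=y
)

models = {
    "decision tree": DecisionTreeClassifier(random_state=50),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=50),
}

# Fit each model and record its accuracy on the held-out test set.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
print(scores)
```

On the real data, `X` and `y` would be the dummy-encoded features and the `Item_Enteric Fermentation` column, and `classification_report` would give a fuller picture than accuracy alone given the class imbalance.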

References¶

Tubiello, F.N. 2019. Greenhouse Gas Emissions Due to Agriculture. In: Ferranti, P., Berry, E.M., Anderson, J.R. (Eds.), Encyclopedia of Food Security and Sustainability, vol. 1, pp. 196–205. Elsevier. ISBN: 9780128126875.

Prosperi, P., Bloise, M., Tubiello, F.N., Conchedda, G., Rossi, S., Boschetti, L., Salvatore, M. & Bernoux, M. 2020. New estimates of greenhouse gas emissions from biomass burning and peat fires using MODIS Collection 6 burned areas. Climatic Change 1–18.

Conchedda, G. and Tubiello, F.N. 2020. Drainage of organic soils and GHG emissions: Validation with country data. Earth System Science Data Discussions 2020, 1–47. https://doi.org/10.5194/essd-2020-202.

Tubiello, F. N., G. Conchedda, N. Wanner, S. Federici, S. Rossi, and G. Grassi. 2021. Carbon Emissions and Removals from Forests: New Estimates, 1990–2020. Earth System Science Data 13 (4): 1681–1691. https://doi.org/10.5194/essd-13-1681-2021.

Tubiello, F. N., Rosenzweig, C., Conchedda, G., Karl, K., Gütschow, J., Xueyao, P., Obli-Laryea, G., Wanner N., Yue Qiu S., De Barros J., Flammini A., Mencos-Contreras E., Souza L., Quadrelli R., Halldórudóttir Heiðarsdóttir H., Benoit P., Hayek M. and Sandalow D. 2021. Greenhouse Gas Emissions from Food Systems: Building the Evidence Base. Environmental Research Letters 16 (6): 065007. https://doi.org/10.1088/1748-9326/ac018e.
